Abstract

The cultivated strawberry (Fragaria x ananassa) is an octoploid (2n = 8x = 56) of the Rosaceae family
whose genomic architecture is still controversial. Several recent studies support the AAA'A'BBB'B' model,
but the complexity of the genome has hindered genetic and genomic analysis of this important crop. To
overcome this difficulty and to assist genome-wide analysis of F. x ananassa, we constructed an integrated
linkage map by organizing a total of 4474 simple sequence repeat (SSR) markers collected from published
Fragaria sequences: 3746 F. vesca expressed sequence tag (EST)-derived SSR markers, 603 F. x ananassa
EST-derived SSR markers, and 125 F. x ananassa transcriptome-derived SSR markers. Along with the
previously published SSR markers, these markers were mapped onto five parent-specific linkage maps
derived from three mapping populations, which were then assembled into an integrated linkage map. The
constructed map consists of 1856 loci in 28 linkage groups (LGs) that total 2364.1 cM in length.
Macrosynteny at the chromosome level was observed between the LGs of F. x ananassa and the genome of
F. vesca. Variety distinction among 129 F. x ananassa lines was demonstrated using 45 selected SSR markers.
1. Introduction

The cultivated strawberry (Fragaria x ananassa) is one of the most popular and globally consumed
fruit crops. It is cultivated in various regions throughout the world; in 2010, 39% of the cultivated
strawberry crop was produced in North and South America, followed by Europe (33%), Asia (18%), and
Africa (9%). 1 Because of the economic importance of this species, breeding programmes of cultivated
strawberry are underway in many countries. To date, more than 3000 varieties bred in 41 countries have
been registered in the International Union for the Protection of New Varieties of Plants variety
database. 2 However, despite intensive use of F. x ananassa in the industry, the progress of genetic and
genomic research of this crop has lagged behind that of many other economically important plant
species, because of its complex genome structure.

Fragaria x ananassa is an octoploid (2n = 8x = 56) species that originated from a natural
hybridization between Fragaria virginiana and Fragaria chiloensis. 3 The genome composition of
F. x ananassa was initially proposed as AABBBBCC 4 or AAA'A'BBBB, 5 based on results of cytological
studies. Later, Bringhurst 6 proposed the AAA'A'BBB'B' model in light of cytological and genetic
evidence. Unlike the first two models, the last model suggests disomic inheritance in all
chromosomes. Although the genome composition of F. x ananassa has not yet been confirmed, the
AAA'A'BBB'B' model has been supported by recent molecular genetic studies. 7-11 The genus Fragaria
belongs to the Rosaceae family. Unlike other genera in the Rosaceae family, Fragaria comprises a
limited number of species (approximately 22). 12 Several species have been nominated as candidate
diploid ancestors, such as Fragaria vesca, Fragaria iinumae, and Fragaria daltoniana. 13,14 Of these
candidates, F. vesca is considered to be the most likely diploid ancestor, and it serves as a model
species of F. x ananassa. 14 Therefore, genomic and genetic studies in F. vesca have been performed
prior to those of F. x ananassa.

In molecular genetics studies, several types of F. vesca sequence-derived DNA markers, such as
simple sequence repeat (SSR) and sequence-characterized amplified region (SCAR) markers, have been
developed since 2003 15-19 (in this article, 'DNA marker' is defined as 'unique DNA sequence(s)
identified by specific primers'). Linkage maps were then constructed with a population derived from
an interspecific cross between F. vesca and Fragaria nubicola. 20-23 In genomic studies, bacterial
artificial chromosome and fosmid libraries were constructed to investigate features of genome
sequences of F. vesca. 24,25 Subsequently, a whole genome sequence for F. vesca was published in
2011, and the results have greatly contributed to advances in genomic and genetic analysis of the
genus Fragaria. 26 In parallel, comparative genomics studies within the Rosaceae family have also
been performed. Cabrera et al. 27 developed a total of 1039 Rosaceae-conserved orthologous set
(RosCOS) markers. Primer pairs of the RosCOS markers were designed on intron-flanking regions of
orthologous genes commonly conserved among the genera Malus, Fragaria, and Prunus and mapped onto
a Prunus reference linkage map. After the genome sequence of F. vesca was published, 26 whole genome
comparisons between Fragaria, Prunus, and Malus were performed, and major orthologous regions were
identified across the genera. 28

In F. x ananassa, the first linkage map was constructed with 235 or 285 amplified fragment length
polymorphism (AFLP) markers on 28 or 30 linkage groups (LGs) based on a 2-way pseudo-testcross
strategy. 29 Several linkage maps were subsequently constructed with AFLP, SCAR, sequence-tagged
site, random amplified polymorphic DNA, and SSR markers. 8-11,30 All the previously published
linkage maps were generated based on single F1 mapping populations, and integrated linkage maps
were developed in each mapping population in all four studies. 8,10,11,30 Of the previously
developed linkage maps, the densest map was constructed by Sargent et al. 10 ; a total of 549 loci
were mapped onto 28 LGs, with a total length of 2140.3 cM. The density of the map was greatly
enhanced over previous maps, but several unsaturated LGs were still observed, such as the
unintegrated LG pair FG2DA and FG2DB, and LG6B, which contained a large gap (36.7 cM). These
results suggest the need to develop a denser linkage map in F. x ananassa to reveal the complex
genome structure of this species.

In this study, we developed SSR markers and constructed an integrated high-density linkage map to
accelerate the advancement of genomic and genetic studies in F. x ananassa. Three types of SSR
markers were developed using public sequence data, namely F. vesca expressed sequence tag
(EST)-derived SSR markers (FVES markers), F. x ananassa EST-derived SSR markers (FAES markers), and
F. x ananassa transcriptome-derived SSR markers (FATS markers). The markers were mapped onto five
parent-specific linkage maps, along with previously published DNA markers, and the five
parent-specific maps were then integrated into one map. The applicability of the integrated map
and the markers developed in this study was also demonstrated by comparative analysis of
F. x ananassa and F. vesca and by variety distinction among 129 F. x ananassa lines with selected
markers. The markers and integrated linkage map described in this study are valuable resources for
future studies that will help to elucidate the genome structure and evolutionary process of
F. x ananassa, and for whole genome sequencing, genetic mapping, and molecular breeding of this
species.

2. Materials and methods 

2.1. Plant material 

An integrated linkage map was constructed using three mapping populations originating from a total
of five parental lines. '02-19' x 'Sachinoka' is an F1 mapping population of 188 individuals
derived from a one-way pseudo-testcross. The female parent '02-19' is a breeding line developed in
Chiba Prefectural Agriculture and Forestry Research Center, which is resistant to powdery mildew
and Fusarium wilt. The male parent 'Sachinoka' is a Japanese variety bred in the National
Agricultural Research Center for Kyusyu and Okinawa Region, which was derived from a cross between
'Toyonoka' and 'Aiberry'. 'Kaorino' x 'Akihime' consisted of 140 F1 individuals derived from a
one-way pseudo-testcross between the parental lines. 'Kaorino' and 'Akihime' were bred in Mie
Prefecture Agricultural Research Institute and by Mr Kazuhiro Ogiwara in Japan, respectively. One
of the ancestor lines of 'Kaorino' is 'Akihime'. 'Kaorino' exhibits crown rot resistance, whereas
'Akihime' is susceptible to the disease. The '0212921' mapping population consisted of 169 S1
individuals of '0212921'. '0212921' was generated from a cross between two breeding lines developed
in Mie Prefecture Agricultural Research Institute, and the male parent of '0212921' was identical
to the female parent of 'Kaorino'.

2.2. Development of EST-derived SSR markers 

SSR regions were identified from the F. vesca and F. x ananassa ESTs registered in a public
database, namely NCBI's nucleotide database (http://www.ncbi.nih.gov). The numbers of ESTs that
were investigated were 45 739 and 6117 in F. vesca and F. x ananassa, respectively. SSRs longer
than 14 bases, which contained all possible combinations of dinucleotide (NN), trinucleotide (NNN)
and tetranucleotide (NNNN) repeats, were identified using the FINDPATTERNS module in the GCG
software package (Accelrys Inc., USA). Oligonucleotides for polymerase chain reaction (PCR) primers
were designed based on the flanking regions of the identified SSRs using the Primer3 program 31 in
such a way that the amplified products ranged from 90 to 300 bp in length. Markers corresponding
to previously published SSR markers in Fragaria spp. and RosCOS markers 8,10,15,17-23,27,32-39
were identified based on primer sequences and were excluded from the collection of FVES and FAES
markers.
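
The SSR search criterion described above (perfect di-, tri-, and tetranucleotide repeats longer than 14 bases) can be illustrated with a minimal Python sketch. This is not the FINDPATTERNS module used in this study; the function name find_ssrs, the regular-expression approach, and the decision to count only whole repeat units are assumptions made for illustration.

```python
import re

# Minimum tract length: "longer than 14 bases" is read here as >= 15 bases (assumption).
MIN_LENGTH = 15


def find_ssrs(seq, unit_sizes=(2, 3, 4), min_length=MIN_LENGTH):
    """Return (start, end, motif, tract) tuples for perfect tandem repeats.

    Hypothetical helper for illustration only. Only whole repeat units are
    counted, so a trailing partial unit does not add to the tract length.
    """
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        # A k-base unit followed by at least one more copy of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run such as AAAA, not an SSR of size k
            if k == 4 and motif[:2] == motif[2:]:
                continue  # ACAC is a dinucleotide repeat, already found with k = 2
            if m.end() - m.start() >= min_length:
                hits.append((m.start(), m.end(), motif, m.group(0)))
    return hits


if __name__ == "__main__":
    example = "TT" + "AG" * 8 + "CC"  # toy sequence, not real EST data
    for start, end, motif, tract in find_ssrs(example):
        print(start, end, motif, len(tract))
```

In a real pipeline the reported coordinates would then be passed to a primer design tool so that primers fall in the flanking regions; that step is not sketched here.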

2.3. Development of transcriptome-derived SSR 
markers 

A total of 1 188 226 F. x ananassa transcript sequence reads registered in NCBI's Sequence Read
Archive (SRA, Accession number: SRX1 6008) were used for identification of the SSR regions. All
reads were generated using the Roche 454 sequencing system (Roche, Basel, Switzerland). The MIRA
3.2.0 program was used for assembling non-redundant contigs. 40 Methods for identification of SSR
regions and primer design were the same as those used for FVES and FAES marker development, except
that penta- and hexanucleotide repeats were also identified and used for primer design. After
designing the primer sequences, redundancy with the FVES, FAES, and published markers described in
the above section was checked. The newly developed markers were designated FATS markers.

2.4. Polymorphic analysis of the DNA markers 

DNA was extracted from the young leaves of plants using a DNeasy Plant Mini Kit (Qiagen Inc.,
Germany). DNA quantification and quality checks were performed using an ND1000 NanoDrop
spectrophotometer (Nanodrop Technologies, DE, USA) and 0.8% agarose gels. In addition to the FVES,
FAES, and FATS markers, a total of 1114 primer pairs of previously published SSR markers developed
from Fragaria sequences and RosCOS markers were used for polymorphic analysis of the '02-19' x
'Sachinoka' mapping population (Supplementary Table S1). 8,15,17-21,23,27,32-38 Polymorphic
analysis of the other two populations was performed without using published markers. PCR was
performed in a 5-µl reaction volume using 0.6 ng of genomic DNA in 1× PCR buffer (Bioline, UK),
3 mM MgCl2, 0.08 U of BIOTAQ DNA polymerase (Bioline, UK), 0.8 mM dNTPs, and 0.4 µM of each primer.
A modified touchdown PCR protocol was followed as described by Sato et al. 41 The PCR products
were separated either by 10% polyacrylamide gel electrophoresis in tris-borate-
ethylenediaminetetraacetic acid (TBE) buffer according to the standard protocol or with an ABI
3730xl fluorescent fragment analyzer (Applied Biosystems, USA), depending on the sizes of the
polymorphic fragments of the PCR amplicons. In the former case, the data were analysed using the
Polyans software (http://www.polyans.kazusa.or.jp), whereas in the latter case, polymorphisms were
investigated using the GeneMapper software (Applied Biosystems, USA).

2.5. Linkage analysis 

In this study, it was assumed that F. x ananassa is an allo-octoploid species and that polymorphic
loci in all chromosomes showed disomic inheritance. Therefore, linkage analysis was performed using
the methodology employed with diploid and outcrossing species. The segregation data of the two
pseudo-testcross mapping populations were categorized into two parent-specific data sets each by
comparing the sizes of polymorphic bands of the parents and progeny. The segregation data were
rescored using the 'HAP1' or 'F2' population type codes employed in JoinMap analysis. 42 'HAP1'
codes were used for the four parental lines of '02-19' x 'Sachinoka' and 'Kaorino' x 'Akihime',
whereas 'F2' codes were employed for the '0212921' population. As a result, a total of five
parent-specific data sets were generated. The segregation data from each parent-specific data set
were then classified into multiple LGs using the colour map method, 43 which compares graphical
genotypes of the segregation data. During the process of colour mapping, reciprocal genotypes were
converted to coupling genotypes. The robustness of the data sets for each LG was then confirmed
using the Grouping Module of the JoinMap program, version 4, with a logarithm of odds (LOD)
threshold of 10. The homeologous LGs within each parent-specific map were assumed based on
corresponding positions on the F. vesca genome, which were predicted by comparative analysis
(described below).

The locus orders in each parent-specific map were calculated using the Regression Mapping Module
of JoinMap. '0212921'-specific data were handled as an 'F2' population type, whereas the other
parent-specific data sets were handled as 'HAP1' population types. The following parameters were
used for the calculation: Kosambi's mapping function, LOD > 1.0, REC frequency < 0.4, goodness of
fit jump threshold for removal of loci = 5.0, number of added loci after which a ripple is
performed = 1, and third round = yes. Each LG in the parent-specific maps was named according to
the following rule: the number after LG (1-7) shows the corresponding chromosome number of F. vesca
predicted by comparative analysis (described below). The capital letters A-D distinguish the four
LGs in each homeologous group (HG), which correspond to LGs in the integrated map. The A-D
suffixes correspond to the lengths of the LGs on the integrated map. When multiple LGs were
integrated into a single LG of the integrated map, they were numbered using capital letters with
underbars (e.g. LG1B_1 and LG1B_2). For unintegrated LGs, X or Y was used as a suffix.
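
Kosambi's mapping function, named in the parameter list above, converts a recombination frequency r into a map distance. The following Python sketch shows the standard textbook form of the function and its inverse; it is an illustration only and does not reproduce JoinMap's regression mapping algorithm. The function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Map distance in cM for recombination frequency r (0 <= r < 0.5).

    Standard Kosambi formula: d = 25 * ln((1 + 2r) / (1 - 2r)) in centimorgans.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination frequency must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse of kosambi_cm: recombination frequency for a distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


if __name__ == "__main__":
    # Example recombination frequencies; 0.4 is the REC threshold quoted above.
    for r in (0.05, 0.10, 0.20, 0.40):
        print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the distance in cM is close to 100 * r, while the function grows steeply as r approaches 0.5, which is one reason pairs of loci with high recombination frequencies are excluded from ordering.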

For construction of an integrated linkage map, corresponding LGs between the parent-specific maps
were assumed by identification of commonly mapped markers on each LG. Prior to integration,
genotypes of dominant loci in each parent-specific map were imputed to co-dominant genotypes,
i.e. alleles of dominant loci were converted to co-dominant alleles, according to the genotypes of
the flanking co-dominant loci. Parent-specific locus genotype data sets were then integrated into
one data set in each LG using the Combine Groups for Mapping Integration Module, followed by locus
ordering by the Regression Mapping Module of JoinMap. The parameters used for the mapping module of
the integrated map were the same as those used for parent-specific mapping. After construction of
the integrated map, the mapped loci were classified into two groups, i.e. loci generated from
single-locus diagnostic (SLD) and multi-loci diagnostic (MLD) markers. SLD markers were defined as
markers that detected single segregating bands mapped onto single positions on the integrated map
and did not amplify other loci, such as monomorphic loci, whereas MLD markers were defined as
markers that amplified more than one locus, including monomorphic loci.

2.6. Comparative mapping 

Syntenic regions between the genomes of F. x ananassa and F. vesca were detected by identifying the
conservation of the relative locations of genes and genomic regions. The genome sequences of
F. vesca were obtained from NCBI [Accession numbers: CM001053.1-CM001059.1 (LG1-7),
GG775183.1-GG775301.1 (unplaced)]. The EST sequences adjacent to the mapped markers on the
integrated F. x ananassa map were compared with the gene sequences in the reference genome using
the BLASTX program with a cutoff E-value < 1e-10. The syntenic regions defined by the top hits of
the homology search were plotted using the Circos program (http://circos.ca).
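
The top-hit selection described above can be sketched as follows. The sketch assumes that the homology search results are available in the 12-column tabular format of BLAST+ (query, subject, identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); the output format actually used in this study is not stated, and the marker and sequence names in the example are invented.

```python
def top_hits(lines, max_evalue=1e-10):
    """Keep, for each query, the hit with the highest bit score below the E-value cutoff.

    Hypothetical helper: `lines` is an iterable of tab-separated rows in the
    standard 12-column BLAST tabular layout (an assumption, see lead-in).
    Returns {query: (subject, subject_start, evalue, bitscore)}.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        subject_start = int(fields[8])
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # cutoff used in the text: E-value < 1e-10
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, subject_start, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Invented example rows, not data from this study.
    rows = [
        "markerA\tLG1\t98.0\t120\t2\t0\t1\t360\t5000\t5120\t1e-50\t200.0",
        "markerA\tLG3\t80.0\t100\t20\t0\t1\t300\t700\t800\t1e-12\t90.0",
        "markerB\tLG2\t70.0\t50\t15\t0\t1\t150\t100\t150\t1e-05\t40.0",
    ]
    for marker, hit in top_hits(rows).items():
        print(marker, hit)
```

Each retained pair of a genetic position on the F. x ananassa map and a physical position on the F. vesca sequence would correspond to one link in a synteny plot.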

2.7. Distinguishing the varieties of 129 F. x ananassa
lines using selected EST-SSR markers

To select markers for distinguishing the varieties of 129 F. x ananassa lines, a total of 22
F. x ananassa varieties and breeding lines, most of which were bred in Japan, were used in a
preliminary polymorphic analysis, namely 'Sachinoka', 'Fusanoka', 'Asuka Wave', 'Kaorino',
'Akihime', 'Miyoshi', 'Dover', 'Strawberry Parental Line Nou-2', 'Karenberry', 'Ohkimi',
'Sanchiigo', 'Toyonoka', 'Nyohou', 'Tochiotome', 'Nou-Hime', 'Asuka Ruby', 'Sagahonoka',
'Beni Hoppe', 'Yayoihime', 'Sanukihime', '02-19', and '0212921'. PCR was performed with all the
FVES and FAES markers developed in this study. The PCR protocol was the same as that used for
polymorphic analysis of the DNA markers (see above). The PCR products were separated by 10%
polyacrylamide gel electrophoresis in TBE buffer using the standard protocol. SSR markers showing
solid and polymorphic amplification were selected, and the robustness of PCR, i.e. the
repeatability of results and the absence of noise peaks, was investigated for the selected markers
using a fragment analyzer (ABI 3730xl, Applied Biosystems, USA) for the 22 F. x ananassa lines.
PCR was performed in a 5-µl reaction volume using 1 ng of genomic DNA in 1× PCR buffer II (Applied
Biosystems, USA), 2.5 mM MgCl2, 0.125 U of AmpliTaq Gold DNA polymerase (Applied Biosystems, USA),
0.2 mM dNTPs, and 0.4 µM of each primer. The thermal cycling conditions were as follows: 7 min
denaturation at 95°C; 30 cycles of 30 s denaturation at 95°C, 30 s annealing at 57°C, and 1 min
extension at 72°C, with a final 10 min extension at 72°C. A total of 100 markers showing high
repeatability were screened, and the robustness of PCR was again confirmed for 129 F. x ananassa
lines, 119 of which had previously been investigated for polymorphisms using CAPS markers 44
(Supplementary Table S2). The 45 best SSR markers were thereby selected to distinguish the
varieties of the 129 strawberry lines. The allelic data were converted into a binary matrix using
the scores 1 or 0 for the presence or absence of each peak. The expected heterozygosity (HZ) of
each identified peak was calculated using the following formula:

$$ HZ_i = 1 - (P_{ip}^2 + P_{ia}^2), $$

where $P_{ip}$ and $P_{ia}$ are the frequencies of presence and absence of the ith peak,
respectively. In F. x ananassa, SSR markers often identify multiple loci due to polyploidy, and it
is often difficult to determine the exact number of loci identified by each marker in an unrelated
population. Therefore, the mean HZ value of the peaks generated by a marker was used as the HZ of
that marker. The allelic binary data were also analysed using GGT 2.0 45 for the investigation of
genetic similarity between the varieties using a Jaccard similarity coefficient. An unweighted
pair group method with arithmetic average (UPGMA) dendrogram was constructed using MEGA version
5.05. 46
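
The per-peak HZ, the per-marker mean HZ, and the Jaccard similarity coefficient described above can be written out in a few lines of Python. This is a minimal sketch of the definitions given in the text, not the GGT 2.0 or MEGA implementation; the function names and the toy presence/absence data are assumptions for illustration.

```python
def peak_hz(scores):
    """HZ of one peak from its 1/0 presence scores across lines: 1 - (p^2 + a^2)."""
    n = len(scores)
    p = sum(scores) / n   # frequency of presence
    a = 1.0 - p           # frequency of absence
    return 1.0 - (p * p + a * a)


def marker_hz(matrix):
    """Mean HZ over all peaks of one marker; rows are lines, columns are peaks."""
    peaks = list(zip(*matrix))
    return sum(peak_hz(col) for col in peaks) / len(peaks)


def jaccard(a, b):
    """Jaccard similarity between two binary profiles (shared peaks / peaks in either)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two profiles with no peaks at all are treated as identical (assumption).
    return both / either if either else 1.0


if __name__ == "__main__":
    # Toy data: four lines scored for three peaks of one hypothetical marker.
    marker = [
        [1, 1, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 1, 1],
    ]
    print("marker HZ:", marker_hz(marker))
    print("Jaccard(line 1, line 2):", jaccard(marker[0], marker[1]))
```

A pairwise matrix of 1 minus the Jaccard coefficient over all lines would be the kind of distance input that a UPGMA clustering step operates on.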



3. Results 

3.1. Development of EST-SSR markers

A total of 2748 and 324 SSRs were identified in 15 203 non-redundant F. vesca and 1029
non-redundant F. x ananassa ESTs, respectively. Of the identified SSRs, 562 primer pairs were
designed based on the flanking regions of 453 F. vesca- and 109 F. x ananassa-derived EST-SSRs and
designated FVES and FAES markers. To increase the number of EST-SSR markers, additional primer
pairs were designed that allowed either single-base mismatches (779 and 107 primer pairs for FVES
and FAES markers, respectively) or two-base mismatches (2514 and 387 primer pairs for FVES and
FAES markers, respectively) in the SSR regions. Of the markers that were generated, those
corresponding to previously published markers were excluded. 8,10,15,17-23,27,32-39 As a result,
a total of 3746 FVES and 603 FAES markers were developed. Design details of the FVES and FAES
markers, including primer sequences, corresponding SSR motifs, expected product sizes, and GenBank
IDs of the template EST sequences, are listed in Supplementary Tables S3 and S4 and on the web at
http://marker.kazusa.or.jp/strawberry/.

Of the FVES and FAES markers that were developed, 2975 (79.4%) and 431 (71.5%) were trinucleotide
repeats, 456 (12.2%) and 117 (19.4%) were dinucleotide repeats, and 315 (8.4%) and 55 (9.1%) were
tetranucleotide repeats, respectively (Table 1). In the FVES markers, the poly(AAG)n motif was
most abundant in the trinucleotide repeats (755 SSRs, 20.1%), followed by poly(GGA)n (492 SSRs,
13.1%), poly(ATC)n (352 SSRs, 9.4%), poly(AG)n (347 SSRs, 9.3%), and poly(AGC)n (321 SSRs, 8.6%).
Like the FVES markers, the poly(AAG)n motif was the most frequently observed motif in the FAES
markers (132 SSRs, 21.9%). This was followed by poly(AG)n (81 SSRs, 13.4%), poly(ATC)n (68 SSRs,
11.3%), and poly(AGC)n (48 SSRs, 8.0%). Among the tetranucleotide repeats, AT-rich motifs, namely
poly(AAAG)n and poly(AAAC)n, were more frequently observed than other motifs in both the FVES and
FAES markers.

3.2. Development of transcriptome-SSR markers 

A total of 1 188 226 SRA sequences were assembled into 80 430 contigs by the MIRA 3.2.0
program. 38 On these contigs, 34 993 SSRs were identified. Primers that targeted the flanking
regions of 129 of the 34 993 SSRs were designed and synthesized. All the 129 SSRs were located on
non-redundant contigs, and a total of 125 primer pairs, which did not overlap with the FVES, FAES,
and previously published markers, 8,10,15,17-23,27,32-39 were designated FATS markers. The primer
sequences of the FATS markers, along with the corresponding SSR motifs, product sizes, and
template contigs, are provided on the web at http://marker.kazusa.or.jp/strawberry/ and in
Supplementary Table S5. Of the 125 FATS markers, trinucleotide repeats were the most frequently
observed (56 SSRs, 44.8%), followed by hexanucleotide repeats (37 SSRs, 29.6%) and dinucleotide
repeats (25 SSRs, 20.0%, Table 1). No tetranucleotide repeats were observed, whereas seven
pentanucleotide repeats were identified. Of the SSR motifs, the poly(AG)n motif was the most
abundant (10 SSRs, 8.0%), followed by poly(TC)n (7 SSRs, 5.6%) and poly(AAG)n (5 SSRs, 4.0%). In
the penta- and hexanucleotide repeats, a single SSR was identified for each of the observed motifs
except poly(GCTGT)n (two SSRs, data not shown).

Motif             FVES                        FAES                        FATS
                  Designed (%)  Mapped (%)    Designed (%)  Mapped (%)    Designed (%)  Mapped (%)
Dinucleotide
  AG              347 (9.3)     120 (11.0)    81 (13.4)     34 (17.7)     10 (8.0)      4 (10.3)
  AT              60 (1.6)      21 (1.9)      22 (3.6)      12 (6.3)      0 (0.0)       0 (0.0)
  AC              49 (1.3)      18 (1.7)      14 (2.3)      5 (2.6)       1 (0.8)       0 (0.0)
  TC              0 (0.0)       0 (0.0)       0 (0.0)       0 (0.0)       7 (5.6)       4 (10.3)
  Others          0 (0.0)       0 (0.0)       0 (0.0)       0 (0.0)       7a (5.6)      0 (0.0)
  Subtotal        456 (12.2)    159 (14.6)    117 (19.4)    51 (26.6)     25 (20.0)     8 (20.5)
Trinucleotide
  AAG             755 (20.1)    210 (19.3)    132 (21.9)    41 (21.4)     5 (4.0)       1 (2.6)
  GGA             492 (13.1)    138 (12.7)    45 (7.5)      6 (3.1)       4 (3.2)       3 (7.7)
  ATC             352 (9.4)     104 (9.6)     68 (11.3)     16 (8.3)      2 (1.6)       1 (2.6)
  AGC             321 (8.6)     92 (8.4)      48 (8.0)      20 (10.4)     3 (2.4)       2 (5.1)
  GGC             291 (7.8)     77 (7.1)      27 (4.5)      11 (5.7)      1 (0.8)       0 (0.0)
  GGT             268 (7.2)     78 (7.2)      40 (6.6)      7 (3.6)       1 (0.8)       0 (0.0)
  AAC             207 (5.5)     58 (5.3)      34 (5.6)      11 (5.7)      0 (0.0)       0 (0.0)
  ACG             168 (4.5)     50 (4.6)      16 (2.7)      5 (2.6)       0 (0.0)       0 (0.0)
  ACT             62 (1.7)      8 (0.7)       5 (0.8)       0 (0.0)       0 (0.0)       0 (0.0)
  AAT             59 (1.6)      17 (1.6)      16 (2.7)      6 (3.1)       0 (0.0)       0 (0.0)
  CTT             0 (0.0)       0 (0.0)       0 (0.0)       0 (0.0)       4 (3.2)       1 (2.6)
  Others          0 (0.0)       0 (0.0)       0 (0.0)       0 (0.0)       36a (28.8)    12 (30.8)
  Subtotal        2975 (79.4)   832 (76.4)    431 (71.5)    123 (64.1)    56 (44.8)     20 (51.3)
Tetranucleotide
  AAAG            100 (2.7)     37 (3.4)      15 (2.5)      4 (2.1)       0 (0.0)       0 (0.0)
  AAAC            54 (1.4)      14 (1.3)      7 (1.2)       1 (0.5)       0 (0.0)       0 (0.0)
  AAAT            43 (1.1)      11 (1.0)      14 (2.3)      6 (3.1)       0 (0.0)       0 (0.0)
  GGGA            26 (0.7)      8 (0.7)       3 (0.5)       2 (1.0)       0 (0.0)       0 (0.0)
  AAGC            21 (0.6)      6 (0.6)       2 (0.3)       1 (0.5)       0 (0.0)       0 (0.0)
  AATC            17 (0.5)      4 (0.4)       5 (0.8)       1 (0.5)       0 (0.0)       0 (0.0)
  AATG            15 (0.4)      7 (0.6)       1 (0.2)       1 (0.5)       0 (0.0)       0 (0.0)
  Others          39 (1.0)      11 (1.0)      8 (1.3)       2 (1.0)       0 (0.0)       0 (0.0)
  Subtotal        315 (8.4)     98 (9.0)      55 (9.1)      18 (9.4)      0 (0.0)       0 (0.0)
Pentanucleotide
  Subtotal        -             -             -             -             7a (5.6)      1 (2.6)
Hexanucleotide
  Subtotal        -             -             -             -             37a (29.6)    10 (25.6)
Total             3746 (100)    1089 (100)    603 (100)     192 (100)     125 (100)     39 (100)

a The numbers of types of observed 'other' SSR motifs in di- and trinucleotide repeats and all penta- and hexanucleotide
repeats were 4, 24, 6, and 37, respectively.

3.3. Construction of parent-specific linkage maps

3.3.1. The '0212921' S1 mapping population

A total of 4474 primer pairs of SSR markers, including 3746 FVES, 603 FAES, and 125 FATS, were
investigated with 8 randomly chosen S1 individuals of the '0212921' mapping population.
Polymorphisms were observed on 605 FVES, 135 FAES, and 29 FATS markers, and segregation analysis
was performed for 169 S1 individuals with 769 primer pairs. A total of 881 segregation locus
genotypes were generated from the 769 primer pairs, and 822 of the 881 loci were mapped onto 34
LGs (Supplementary Tables S6 and S7). The length of each LG ranged from 1.5 cM (LG4B) to 94.5 cM
(LG1A), representing a total length of 1508.3 cM. The mean locus density and segregation
distortion (P < 0.05) were 1.83 cM/locus and 22.4%, respectively, ranging from 0.22 cM/locus
(LG6D) to 4.26 cM/locus (LG2A_2) and from 0.0% (LG4B) to 85.7% (LG5A_1), respectively.
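
The mean locus density reported above is consistent with dividing the total map length by the number of mapped loci. The short Python check below reproduces that arithmetic for the '0212921' map and for the integrated map figures given in the Abstract; treating the density as total length divided by locus count is an assumption inferred from the reported numbers, and the function name is hypothetical.

```python
def locus_density(total_length_cm, n_loci):
    """Mean locus density in cM per locus (total map length / number of mapped loci)."""
    if n_loci <= 0:
        raise ValueError("number of loci must be positive")
    return total_length_cm / n_loci


if __name__ == "__main__":
    # (total length in cM, number of mapped loci), values taken from the text.
    reported = {
        "'0212921' parent-specific map": (1508.3, 822),
        "integrated map": (2364.1, 1856),
    }
    for name, (length_cm, n_loci) in reported.items():
        print(f"{name}: {locus_density(length_cm, n_loci):.2f} cM per locus")
```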



3.3.2. The '02-1 9' x 'Sachinoka' mapping 
population Polymorphic analysis was per- 
formed with a total of 5588 primer pairs of SSR 
markers, including 4474 SSR markers developed in 
this study and 1114 previously published markers 
(Supplementary Table S1). A total of 1 299 markers, 
i.e. 853 FVES, 131 FAES, 37 FATS, and 278 published 
markers, showed polymorphisms between the 
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Table 2. Number of mapped loci, length, locus density in an integrated map, and numbers of integrated LGs and loci of parental specific 
maps 



LG Interspecific map Parental specific maps 



'0212921' '02-19' 'Sachinoka' 'Kaorino' 'Akihime' 





Number of 
mapped loci 


Length 
(cM) 


Locus density 
(cM) 


Single 
loci 3 (%) 


LGs b 


Loci c 


LGs b 


Loci c 


LGs b 


Loci c 


LGs b 


Loci c 


LGs b 


Loci c 


1A 


91 


123.7 


1.36 


1 6 (1 7.5) 


1 


48 


1 


40 


1 


13 


1 


7 


2 


10 


1 B 


81 


80.7 


1.00 


25 (30.8) 


1 


25 


1 


41 


2 


42 


2 


1 4 


0 


0 


1C 


37 


77.2 


2.09 


1 1 (29.7) 


1 


24 


1 


9 


1 


1 3 


0 


0 


0 


0 


1 D 


24 


34.2 


1.43 


5 (20.8) 


1 


24 


0 


0 


0 


0 


0 


0 


0 


0 


2A 


123 


1 1 3.9 


0.93 


41 (33.3) 


2 


25 


2 


50 


1 


67 


1 


8 


2 


22 


2B 


71 


95.1 


1.34 


1 5 (21.1) 


1 


42 


2 


20 


1 


4 


1 


1 1 


1 


1 5 


2C 


55 


93.8 


1.71 


1 3 (23.6) 


1 


28 


0 


0 


1 


1 8 


1 


24 


0 


0 


2D 


64 


81.1 


1.27 


1 9 (29.6) 


1 


37 


1 


28 


1 


7 


1 


8 


0 


0 


3A 


1 14 


96.0 


0.84 


47 (41.2) 


1 


33 


2 


70 


2 


27 


1 


23 


1 


1 8 


3B 


89 


82.7 


0.93 


1 9 (21 .3) 


1 


40 


2 


37 


1 


24 


0 


0 


1 


1 1 


3C 


51 


70.8 


1.39 


1 3 (25.4) 


1 


36 


0 


0 


1 


1 3 


1 


1 1 


0 


0 


3D 


38 


65.8 


1.73 


5 (13.1) 


1 


30 


0 


0 


0 


0 


1 


9 


1 


3 


4A 


88 


1 1 2.2 


1.28 


31 (35.2) 


1 


24 


1 


45 


2 


1 9 


0 


0 


2 


42 


4B 


1 8 


91.4 


5.08 


2 (1 1 .1) 


1 


3 


1 


5 


1 


1 3 


0 


0 


0 


0 


4C 


64 


87.9 


1.37 


6 (9.37) 


2 


32 


2 


1 6 


1 


1 2 


1 


1 5 


1 


1 3 


4D 


45 


53.0 


1.1 8 


7 (1 5.5) 


1 


27 


0 


0 


1 


1 3 


2 


1 4 


1 


5 


5A 


1 30 


1 19.0 


0.92 


40 (30.7) 


2 


25 


2 


53 


2 


77 


2 


9 


1 


9 


5B 


57 


79.3 


1.39 


1 3 (22.8) 


1 


35 


0 


0 


0 


0 


1 


26 


2 


1 5 


5C 


1 8 


74.8 


4.1 6 


3 (1 6.6) 


0 


0 


1 


1 2 


0 


0 


0 


0 


1 


1 0 


5D 


23 


46.0 


2.00 


5 (21.7) 


1 


1 0 


0 


0 


0 


0 


1 


1 5 


0 


0 


6A 


131 


1 09.8 


0.84 


31 (23.6) 


3 


37 


4 


37 


2 


44 


3 


1 6 


4 


49 


6B 


1 29 


87.9 


0.68 


32 (24.8) 


2 


86 


1 


29 


2 


31 


2 


1 5 


2 


9 


6C 


59 


87.4 


1.48 


1 1 (1 8.6) 


1 


43 


1 


4 


1 


1 8 


1 


6 


0 


0 


6D 


49 


76.4 


1.56 


1 0 (20.4) 


1 


1 1 


0 


0 


1 


1 5 


2 


24 


2 


23 


7A 


80 


1 1 5.7 


1.45 


24 (30.0) 


1 


27 


2 


25 


1 


33 


1 


1 7 


1 


24 


7B 


37 


72.4 


1.96 


1 1 (29.7) 


1 


30 


0 


0 


0 


0 


1 


5 


1 


9 


7C 


24 


68.6 


2.86 


6 (25.0) 


1 


1 4 


0 


0 


2 


1 5 


0 


0 


0 


0 


7D 


66 


67.3 


1.02 


20 (30.3) 


1 


21 


2 


35 


2 


1 4 


1 


3 


2 


1 4 


Unintegrated 










1 


5 


3 


20 


4 


24 


4 


1 4 


5 


1 7 


Total 


1 856 


2364.1 


1.27 


481 (25.9) 


34 


822 


32 


576 


34 


556 


32 


294 


33 


318 



a Number of mapped loci generated from SLD markers. Numbers in parentheses show percentage of all the mapped loci. 
b Number of integrated LGs. 
c Number of integrated loci. 



parental lines. Segregation analysis of the 188 individuals of
the mapping population generated 1078 polymorphic loci from 881
of the 1299 primer pairs that were tested. Of the 1078
segregating loci, 260 showed biparental polymorphisms, while the
other 818 were parent specific. A total of 575 and 556 loci were
mapped onto the '02-19'- and 'Sachinoka'-specific maps,
respectively (Supplementary Tables S6 and S7). In the
'02-19'-specific map, 32 LGs were constructed, spanning
1668.9 cM, with lengths ranging from 2.0 cM (LG2X) to 129.2 cM
(LG1A). The mean locus density and segregation distortion
(P < 0.05) were 2.90 cM locus⁻¹ and 30.2%, respectively, ranging
from 0.34 cM locus⁻¹ (LG5A_2) to 11.63 cM locus⁻¹ (LG6C) and
from 0.0% (LG3B_2, 5A_2 and 7D_2) to 100% (LG2X and 4B),
respectively. In the 'Sachinoka' map, 34 LGs were developed,
totalling 2166.6 cM in length, with a mean locus density of
3.90 cM locus⁻¹. The length and locus density of each LG ranged
from 1.6 cM (LG7C_2) to 120.0 cM (LG7A) and from 0.80 cM locus⁻¹
(LG7C_2) to 16.08 cM locus⁻¹ (LG3X), respectively. Segregation
distortion (P < 0.05) of each LG ranged from 0.0% (LG7C_2, 7D_1
and 7X) to 85.7% (LG2X), with a mean value of 25.4%.
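
The segregation distortion percentages reported above can be illustrated with a minimal sketch. It assumes a chi-square goodness-of-fit test with one degree of freedom against a 1:1 expected ratio, as would apply to a parent-specific locus; the test actually used is not stated in this section, and the genotype counts below are hypothetical.

```python
import math


def chi_square_distortion(count_a, count_b, expected_ratio=(1, 1)):
    """Return (chi2, p) for a two-class goodness-of-fit test (1 d.f.)."""
    total = count_a + count_b
    ratio_sum = expected_ratio[0] + expected_ratio[1]
    expected_a = total * expected_ratio[0] / ratio_sum
    expected_b = total * expected_ratio[1] / ratio_sum
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # Upper-tail probability of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def distorted_percentage(loci, alpha=0.05):
    """Percentage of loci deviating from the expected ratio at P < alpha."""
    flagged = sum(1 for a, b in loci if chi_square_distortion(a, b)[1] < alpha)
    return 100.0 * flagged / len(loci)


# Hypothetical genotype counts for three loci scored in 188 individuals.
example_loci = [(94, 94), (100, 88), (120, 68)]
print(f"{distorted_percentage(example_loci):.1f}% of loci distorted")
```

Only the third locus in this toy set departs significantly from 1:1, so one locus in three is flagged.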

3.3.3. The 'Kaorino' x 'Akihime' mapping population

A total of 4474 primer pairs of SSR markers developed in this
study were used for polymorphic analysis with the parental lines.
Polymorphisms were observed in 438 FVES, 95 FAES, and 33 FATS
markers, and segregation analysis was performed on the 140
mapping plants with these 566 primer pairs. A total of 612
segregating loci were generated from 537 of the 566 primer pairs
that were tested. Of the 612 segregating loci, 140 showed
biparental polymorphisms, while the other 472 were parent
specific. A total of 294 and 318 loci were mapped onto the
'Kaorino'- and 'Akihime'-specific maps, respectively
(Supplementary Tables S6 and S7). The 'Kaorino' map consisted of
32 LGs, representing a total length of 1103.4 cM. The mean locus
density and segregation distortion (P < 0.05) were
3.75 cM locus⁻¹ and 24.8%, respectively, ranging from
0.45 cM locus⁻¹ (LG1B_2) to 7.61 cM locus⁻¹ (LG1A) and from 0.0%
(LG1B_2, 1X, 2A, 2D, 4X, 5A_1, 5X, 6A_2, 6A_3 and 7D) to 100%
(LG4D_2), respectively. In the 'Akihime' map, 33 LGs were
developed, with a total length of 951.4 cM and a mean locus
density of 2.99 cM locus⁻¹. The length and locus density of each
LG ranged from 2.2 cM (LG5X) to 77.7 cM (LG7A) and from
0.55 cM locus⁻¹ (LG5X) to 12.72 cM locus⁻¹ (LG6A_4),
respectively. Segregation distortion (P < 0.05) of each LG ranged
from 0.0% (LG3X, 4A_2, 4C, 5A, 5C, 5X, 6A_1, 6B_1, 6B_2, and
7D_2) to 100% (LG1A_1), with a mean value of 14.8%.
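
As a worked check, the mean locus densities quoted for the four parent-specific maps follow from dividing total map length by the number of mapped loci. The short sketch below uses only the lengths and locus counts given in the text; the dictionary layout and function name are illustrative.

```python
# (total length in cM, mapped loci, reported mean density in cM per locus)
parent_maps = {
    "02-19": (1668.9, 575, 2.90),
    "Sachinoka": (2166.6, 556, 3.90),
    "Kaorino": (1103.4, 294, 3.75),
    "Akihime": (951.4, 318, 2.99),
}


def locus_density(length_cm, n_loci):
    """Mean locus density: map length divided by the number of mapped loci."""
    return length_cm / n_loci


for name, (length_cm, n_loci, reported) in parent_maps.items():
    density = locus_density(length_cm, n_loci)
    print(f"{name}: {density:.2f} cM per locus (reported {reported:.2f})")
```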



3.4. Construction of an integrated linkage map 

Prior to integration, the subsets of parent-specific LGs to be
integrated were determined from the numbers of markers mapped in
common across the parent-specific maps; that is, pairs of LGs
sharing the largest number of commonly mapped markers were
assembled into the same subset. When one LG shared the same
number of commonly mapped markers with more than one partner, the
LG was excluded from the integration. Locus genotype data for
each subset of LGs were integrated using the Combine Groups for
Mapping Integration module in JoinMap, and the locus order of the
integrated map was then calculated. When the grouping of an LG
subset was inadequate, the segregating loci of the incorrect LGs
were excluded during locus ordering. When an LG subset was
misassembled from incorrect pairs of parent-specific LGs, the
loci of the mis-integrated LG overlapped with loci on other
integrated LGs in which correctly paired LGs had been integrated.
Therefore, the correct assembly of each LG subset was confirmed
by checking the number of overlapping loci on each integrated LG
using the Mapping module in JoinMap.
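
The pairing rule described above (take the partner LG sharing the most markers, and exclude an LG whose best score is tied) can be sketched as follows. This is an illustration of the rule only, not the JoinMap procedure; the function name, marker names, and LG contents are hypothetical.

```python
def best_partner(query_markers, candidate_lgs):
    """Return the candidate LG sharing the most markers with the query LG.

    Returns None when no marker is shared or when the top count is tied,
    mirroring the exclusion of ambiguous LGs from the integration.
    """
    shared = {name: len(query_markers & markers)
              for name, markers in candidate_lgs.items()}
    top = max(shared.values(), default=0)
    if top == 0:
        return None
    winners = [name for name, count in shared.items() if count == top]
    return winners[0] if len(winners) == 1 else None


# Hypothetical marker sets for two LGs of a reference parent-specific map.
reference_lgs = {
    "LG1A": {"m1", "m2", "m3", "m4"},
    "LG1B": {"m5", "m6", "m7"},
}

print(best_partner({"m1", "m2", "m9"}, reference_lgs))  # unambiguous partner
print(best_partner({"m1", "m5"}, reference_lgs))        # tie, so excluded
```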

The integrated linkage map consisted of 1856 loci on 28 LGs,
totalling 2364.1 cM in length (Table 2, Supplementary Table S7
and Fig. S1). The length of each LG varied from 34.2 cM (LG1D)
to 123.7 cM (LG1A). The mean locus density was 1.27 cM locus⁻¹,
ranging from 0.68 cM locus⁻¹ (LG6B) to 5.08 cM locus⁻¹ (LG4B).
The largest gap was 27.3 cM, between FVES1598_7a and FVES1351_7a
on LG7A, followed by a 23.4 cM gap between FAES0326_4b and
FAES3462_4b on LG4B, and a 21.6 cM gap between FVES0576_4c and
FAES0045 on LG4C. The number of parent-specific LGs merged into
each integrated LG ranged from 1 (LG1D) to 16 (LG6A) (Table 2).
The number of unintegrated parent-specific LGs, which totalled
17, ranged from 1 ('0212921') to 5 ('Akihime') in each parental
map. The ratios of mapped FVES, FAES, and FATS loci were 77.5,
15, and 3.4%, respectively (Supplementary Fig. S2). The ratio of
mapped FVES markers in each LG ranged from 62.5 to 93.3%.

Of the mapped loci, 481 (25.9%) were generated from SLD markers,
whereas the other 1375 were from MLD markers. The ratio of mapped
loci generated from SLD markers in each LG varied from 9.4%
(LG4C) to 41.2% (LG3A). These loci were mapped randomly onto many
of the LGs, while several clusters were observed on parts of some
LGs (LG1A, 1C, 2C, 4C, 4D, 6C, 6D, 7B, and 7C; Supplementary
Fig. S1). Loci generated from MLD markers were classified into
four types: multiple loci mapped onto homeologous LGs (Multi_H in
Supplementary Table S7), onto non-homeologous LGs (Multi_NH),
onto the same LG (Multi_S), and onto a single position
(Multi_NM). For the Multi_NM loci, all the observed multiple
bands were monomorphic except for the mapped ones. Of the four
types of multi loci, Multi_H was most frequently observed (684
loci), followed by Multi_NM (523), Multi_S (206), and Multi_NH
(77). Of the 684 Multi_H loci, 16 and 99 also had corresponding
loci on non-homeologous LGs (Multi_H&NH) and on the same LG
(Multi_H&S), respectively. The Multi_H loci were randomly
distributed along the entire integrated linkage map, whereas
Multi_S loci were not observed on several LGs, i.e. LG2B, 2D, 3C,
4D, 5A, 5B, 7B, and 7C (Supplementary Fig. S1). In each HG, the
homeologous regions differed depending on the paired LGs
(Supplementary Fig. S3). For example, most homeologous regions of
LG2A and LG2C were observed at 20-40 cM, while those of LG2B and
LG2D were identified at 60-80 cM. In the case of HG1, homeologous
regions were not observed between LG1C and LG1D. Such biases of
homeologous pairs were identified across the genome.

3.5. Comparison with the genome of a wild relative,
F. vesca

Of the 3746 ESTs that corresponded to the designed FVES markers,
3743 showed significant similarities to the genome sequences of
F. vesca, while 3 ESTs, corresponding to FVES1248, FVES2637, and
FVES2807, did not (Supplementary Table S3). Of the 3743 ESTs,
3608 showed similarities to genome sequences placed on the 7
chromosomes of F. vesca, and the other 135 were mapped onto
unplaced genomic scaffolds. For the F. x ananassa
sequence-derived markers, significant similarities to the
F. vesca genome sequences were observed in all 603 ESTs and in
124 of the 125 transcript contigs from which the FAES and FATS
markers were designed, respectively (Supplementary Tables S4 and
S5). Of these, 593 and 120 sequences corresponding to FAES and
FATS markers, respectively, were mapped onto F. vesca genome
sequences placed on the 7 chromosomes, whereas the other 14 ESTs
or transcript contigs were mapped onto unplaced scaffolds.
Similarity searches were also performed for the previously
published markers that were located on the integrated map. Of the
91 mapped previously published markers, the ESTs or genome
sequences from which the markers were designed are available for
68 in the NCBI dbGSS database
(http://www.ncbi.nlm.nih.gov/projects/dbGSS, Supplementary
Table S1). Of the 68 sequences, 62 showed significant
similarities to the F. vesca genome sequences placed onto the 7
chromosomes.

A total of 1354 markers that generated 1783 mapped loci on the
integrated linkage map showed significant similarity with
F. vesca genome sequences placed onto the 7 chromosomes. By
considering the sequences with the highest similarity scores to
be putative orthologs, the map locations of the SSR markers and
the corresponding F. vesca genome sequences were compared. As
shown in Fig. 1, the alignment of homologous sequence pairs along
each LG revealed an obvious syntenic relationship between
corresponding HGs in F. x ananassa (Fa-HG) and chromosomes (Chr)
in F. vesca (Fv-Chr). Most of the regions did not show synteny
between non-corresponding Fa-HGs and Fv-Chr, except between
Fa-HG3 and 25-30 Mb on Fv-Chr2. In some LGs of F. x ananassa,
syntenic regions were observed along the whole of the
corresponding chromosome in F. vesca, whereas segmental syntenic
blocks were identified between the other pairs of Fa-LGs and
Fv-Chr (Supplementary Fig. S3-1). For example, Fa-LG1A displayed
synteny with the entire region of Fv-Chr1, whereas segmental
syntenic blocks were observed between 0 and 11 Mb in Fv-Chr1 and
Fa-LG1B or Fa-LG1D, as well as between 8 and 23 Mb in Fv-Chr1 and
Fa-LG1C.
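
The comparison described above can be pictured as a count of mapped loci per pair of homeologous group and F. vesca chromosome, where macrosynteny shows up as counts concentrated on matching numbers. The sketch below is a simplified illustration under that assumption; the best-hit records are hypothetical and the functions are not taken from the study.

```python
import re
from collections import Counter


def homeologous_group(lg_name):
    """Extract the HG number from an LG name such as 'LG3A'."""
    match = re.match(r"LG(\d)", lg_name)
    if match is None:
        raise ValueError(f"unexpected LG name: {lg_name}")
    return int(match.group(1))


def synteny_table(hits):
    """Count loci for each (Fa-HG, Fv-Chr) combination."""
    table = Counter()
    for lg_name, chromosome in hits:
        table[(homeologous_group(lg_name), chromosome)] += 1
    return table


def fraction_on_corresponding_chromosome(table):
    """Share of loci whose best hit lies on the chromosome matching their HG."""
    total = sum(table.values())
    matching = sum(n for (hg, chrom), n in table.items() if hg == chrom)
    return matching / total


# Hypothetical best hits: (LG of the mapped locus, F. vesca chromosome).
example_hits = [("LG1A", 1), ("LG1B", 1), ("LG2C", 2),
                ("LG3A", 3), ("LG3B", 2), ("LG7D", 7)]
example_table = synteny_table(example_hits)
print(example_table[(3, 2)], fraction_on_corresponding_chromosome(example_table))
```

Off-diagonal cells, such as the single HG3 locus hitting chromosome 2 in this toy set, are the ones that would point to rearrangements across groups.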



3.6. Distinguishing the varieties of 129 F. x ananassa
lines using selected EST-SSR markers

To select the best marker set for determining the varieties of
the F. x ananassa lines, polymorphic analysis of the 22
cultivated strawberry lines was performed using primer pairs
targeting 3746 FVES and 603 FAES markers. Of the primer pairs for
FVES markers, 2949 resulted in solid amplification, whereas 431,
263, and 103 of the remaining primer pairs resulted in no
amplification, multiple bands, and rare bands, respectively
(Supplementary Table S3). For the FAES markers, solid
amplification was observed for 460 primer pairs, whereas 88, 52,
and 3 of the remaining primer pairs resulted in no amplification,
multiple bands, and rare bands, respectively (Supplementary
Table S4). A total of 751 primer pairs, including 650 FVES and
101 FAES, showed solid and polymorphic amplification and were
examined further for PCR stability and noise peaks using a
fragment analyzer on all 22 lines. A total of 100 of the 751
primer pairs, showing high stability and few noise peaks, were
selected and used for polymorphic analysis with 129 strawberry
lines to reconfirm the robustness of PCR in a more diverse
collection. From this experiment, the 45 best markers, which
showed high stability in PCR and few noise peaks, were selected.
PCR was then performed twice more with the 129 strawberry lines
to exclude genotyping errors (Supplementary Table S3).

Figure 1. Graphical view of the syntenic relationship between F. x ananassa and F. vesca. Red and blue bars show LGs of F. x ananassa and
chromosomes of F. vesca, respectively. Syntenic regions between the two species are connected by coloured lines.

The primer pairs of the 45 selected SSR markers, including 4 FAES
and 41 FVES markers, generated a total of 158 peaks in the 129
strawberry lines that were tested
(http://vim.kazusa.or.jp/Strawberry/). The number of identified
peaks per primer pair ranged from two to seven, with a mean value
of 3.51 (Supplementary Fig. S4). The HZ value of each identified
peak was calculated based on a score of 1 or 0 for the presence
or absence of the peak and ranged from 0 to 0.5. The mean HZ
value of each marker ranged from 0.11 to 0.43, with an average
value of 0.25. Similarity coefficients were used to examine the
genetic relationships between the 129 strawberry lines. Across
all possible pairs of lines, the similarity coefficients ranged
from 0.00 to 0.41 (Supplementary Table S2). No genetic
differences were identified with the 45 markers between 'Nyoho'
and 'Shinnyoho' or between 'Himatsuri' and 'Toyonoka'. The
highest similarity coefficient, 0.41, was found between the UK
variety 'Serenata' and the Japanese variety 'Summer berry'.
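
For a peak scored 1 or 0 across lines, one common heterozygosity-type index is 1 - p^2 - (1 - p)^2, where p is the frequency of lines carrying the peak; it reaches its maximum of 0.5 at p = 0.5, which matches the 0 to 0.5 range quoted above. The sketch below assumes this formula, which is not defined in this section, and uses hypothetical scores.

```python
def peak_heterozygosity(scores):
    """Index for one peak from 1/0 presence scores (assumed formula)."""
    frequency = sum(scores) / len(scores)
    return 1.0 - frequency ** 2 - (1.0 - frequency) ** 2


def marker_mean_heterozygosity(peaks):
    """Mean of the per-peak values for all peaks of one marker."""
    values = [peak_heterozygosity(scores) for scores in peaks]
    return sum(values) / len(values)


# Hypothetical scores for three peaks of one marker across four lines.
example_marker = [
    [1, 1, 0, 0],  # present in half of the lines
    [1, 0, 0, 0],  # present in a quarter of the lines
    [1, 1, 1, 1],  # monomorphic peak
]
print(round(marker_mean_heterozygosity(example_marker), 3))
```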



4. Discussion 

In this study, a total of 4474 SSR markers, including 3746 FVES,
603 FAES, and 125 FATS, were designed from public EST and
transcript sequences of F. vesca and F. x ananassa. Of the 4474
SSR markers, 672 resulted in rare or no amplification, and the
remaining 3802 SSR markers resulted in amplification. To our
knowledge, 441 SSR markers have previously been published for the
genus Fragaria. 8,10,15,17-23,27,32-39 The number of SSR markers
that amplified in this study is therefore approximately 8.6 times
that of previously published markers. Recent advances in genome
sequencing technology have enabled the large-scale development of
single nucleotide polymorphism (SNP) markers in many plant
species. However, SNP discovery and genotyping are still
difficult in polyploid species because of the difficulty in
distinguishing between homeologous and allelic SNPs. Therefore,
SSR markers are currently the most rapid and conclusive marker
system for most polyploid species, including F. x ananassa. The
numerous SSR markers developed in this study are an important
resource for genetic and genomic studies of this species.

In this study, linkage analysis was performed using a methodology
previously employed in diploid and outcrossing species, under the
assumption that F. x ananassa is an allo-octoploid species with
an AAA'A'BBB'B' genome structure. The large number of mapped loci
that showed disomic inheritance supported this assumption.

In the '02-19' x 'Sachinoka' and 'Kaorino' x 'Akihime'
populations, polymorphic marker screening was performed only with
the parental lines, which meant that polymorphic markers showing
an <ab x ab> segregation pattern could not be detected.
Meanwhile, eight randomly selected S1 individuals were used in
polymorphic marker screening of the '0212921' population, and all
possible polymorphic markers were selected. Therefore, we
considered that the large differences in the number of mapped
loci among the parent-specific maps were due to the steps in
polymorphic marker screening and the genetic distances between
haplotypes within parents. All the previously published maps of
F. x ananassa were derived from F1 mapping populations, and the
'0212921'-specific map is the first parent-specific map derived
from an S1 population. The density of the '0212921'
parent-specific map suggests that an S1 population can be used
for linkage map construction in F. x ananassa.

According to a previous study 10 and the integrated map developed
in this study, the length of a saturated linkage map in
F. x ananassa would exceed 2000 cM. The lengths of all the
parent-specific maps were shorter than 2000 cM, except for the
'Sachinoka' map, which was 2166.6 cM. In addition, no
parent-specific map had 28 LGs, i.e. the number of chromosomes in
the haploid genome of F. x ananassa. Therefore, it was concluded
that all the parent-specific maps were unsaturated.

The integrated linkage map comprises 1856 loci on 28 LGs. The
total length of the map, 2364 cM, is slightly longer than that of
the previously reported densest map, 2140 cM, by Sargent et
al. 10 The mean locus density of our integrated map was
1.27 cM locus⁻¹, which was three times denser than that of the
map of Sargent et al. Moreover, the integrated map presented in
this study is the first linkage map derived from multiple mapping
populations of F. x ananassa, suggesting that the map reflects
wider genetic diversity in the species. Several LGs of the
integrated map seemed to be saturated, whereas some others were
not, such as LG1D, LG4B, LG5C, LG5D, LG7B, and LG7C. In addition,
the parent-specific LGs were not evenly merged into the LGs in
the integrated map. For example, LG6A in the integrated map
consists of 16 parent-specific LGs, whereas LG1D was derived from
a single parent-specific LG of the '0212921'-specific map. One of
the causes of the uneven positions of the mapped loci and the
uneven number of merged parent-specific LGs may be the bias of
heterozygous (polymorphic) genomic regions within each parental
line. Moreover, the source of the EST-SSR markers might affect
the evenness of the mapping. Of the mapped loci on the integrated
map, 82% were derived from F. vesca sequences, whereas 14 and 4%
were generated from F. x ananassa and other species,
respectively. Previous molecular and cytogenetic studies
suggested that F. vesca and F. iinumae or F. daltoniana were
candidate ancestral species of F. x ananassa. 13,14 This theory
suggests that some genomic regions in F. x ananassa were derived
from F. iinumae or F. daltoniana and may show non-homology with
the genome of F. vesca. In addition, the ratio of mapped FVES
loci ranged from 62.5 to 93.3% (Supplementary Fig. S2). The large
contribution of F. vesca sequence-derived SSR markers to the
integrated map may have biased the positions of the mapped loci.

Although a total of 107 loci derived from previously published
markers were mapped onto the integrated linkage map, we did not
employ the corresponding LG names used in previously published
maps. This was due to the insufficient number of commonly mapped
markers across linkage maps, along with the large number of
mapped multi loci. In this study, 'A' to 'D' suffixes were added
to the LG names according to the lengths of the LGs; A was used
for the longest LG, whereas D was added to the shortest.
Therefore, there are no biological relationships among the LGs
designated with the same capital letter suffix. We propose that
the capital letters be replaced so as to represent the genome
structure of F. x ananassa, for example AAA'A'BBB'B', once the
LGs corresponding to each genome have been identified in future.

Loci generated from SLD markers should play an important role in
the identification of specific LGs and chromosomes in
F. x ananassa. In this study, we mapped both SLD and MLD markers,
and 25.9% (481) of the mapped loci were derived from SLD markers.
Indeed, the karyotypes of octoploid Fragaria species have already
been reported in F. chiloensis and F. virginiana, 47 as well as
in F. x ananassa. 48 By using the genomic regions of mapped
single loci as probes for cytological analysis such as
fluorescence in situ hybridization, the karyotypes of
F. x ananassa corresponding to each LG in the integrated map
could be identified. In addition, the genome structure of
F. x ananassa could be resolved by comparative mapping with
candidate ancestral Fragaria species using markers detecting
single loci.

While 25.9% of the mapped loci were generated from SLD markers,
the remaining 74.1% (1375) were derived from MLD markers,
comprising 569 Multi_H, 16 Multi_H&NH, 99 Multi_H&S, 61 Multi_NH,
107 Multi_S, and 523 Multi_NM. The ratio of all Multi_H
(including Multi_H&NH and Multi_H&S) to the sum of mapped loci
(except Multi_NM) was 0.51 [(569 + 16 + 99)/(1856 - 523)]; that
is, approximately half of the mapped loci on the integrated map
were Multi_H. The Multi_H loci were mapped along the entire
genome. However, the corresponding homeologous positions differed
depending on the paired LGs. The uneven homeologous regions
between each pair of LGs might represent a feature of the genome
composition of F. x ananassa.
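
The locus counts in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
# Locus counts on the integrated map, as quoted in the text.
total_loci = 1856
sld_loci = 481
mld_classes = {
    "Multi_H": 569,
    "Multi_H&NH": 16,
    "Multi_H&S": 99,
    "Multi_NH": 61,
    "Multi_S": 107,
    "Multi_NM": 523,
}

mld_loci = sum(mld_classes.values())
all_multi_h = (mld_classes["Multi_H"]
               + mld_classes["Multi_H&NH"]
               + mld_classes["Multi_H&S"])
multi_h_ratio = all_multi_h / (total_loci - mld_classes["Multi_NM"])

print(mld_loci, sld_loci + mld_loci)        # MLD loci and grand total
print(f"{100 * sld_loci / total_loci:.1f}%")  # share of SLD loci
print(f"{multi_h_ratio:.2f}")                 # share of Multi_H loci
```

The six MLD classes sum to 1375, SLD and MLD loci together give 1856, and the Multi_H ratio is 684/1333.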

Comparative mapping studies between F. x ananassa and F. vesca
have been reported. 8-11 According to previous results, most of
the genomic regions are conserved between the two species, with
the rare exception of chromosome rearrangements on HG1, HG3, and
HG6. Results from our comparative mapping study generally agreed
with those of previous studies, i.e. the identification of clear
macrosynteny over both entire genomes. However, our results
suggest that partial rearrangement occurred more frequently
between the homeologous genomes in F. x ananassa and F. vesca
than previously reported. In addition, genomic rearrangement
across HGs was first observed in this study, i.e. between Fv-Chr2
and Fa-HG3 (Fig. 1). Of the LGs belonging to HG3, LG3A, 3B and 3C
showed the genomic rearrangement, whereas it was not observed on
LG3D.

The varieties of 129 F. x ananassa lines were distinguished to
demonstrate the practical utility of the markers developed in
this study. Ninety-one percent (118 of 129) of the tested lines
were developed in Japan. Because most Japanese lines were
developed from a limited number of ancestral lines, such as
'Haward 17' and 'General Chanzy', genetic diversity is considered
to be generally narrow in Japanese germplasm. 49 Despite the
expected narrow genetic diversity, the 45 SSR markers employed in
this study identified most of the tested lines except for 2
pairs, 'Nyoho' and 'Shin-Nyoho', and 'Toyonoka' and 'Himatsuri'.
'Shin-Nyoho' and 'Himatsuri' were developed from mutation lines
of 'Nyoho' and 'Toyonoka', respectively. Therefore, it was
assumed that the genetic diversity was quite narrow within each
pair of undistinguished varieties. Meanwhile, it is notable that
the 'Anter', 'Pihyaradondon', and 'Nyoho' lines and the 'Akita
Berry' and 'Morioka 16' lines could be distinguished from each
other, although 'Anter' and 'Pihyaradondon' are mutant lines of
'Nyoho' and 'Akita Berry' is a mutant line of 'Morioka 16'.

Variety identification is one of the major practical uses of DNA
markers. Such identification should protect breeders' rights and
prevent the contamination of clonal seedlings during propagation.
Because unknown samples are often used in variety identification,
the accurate detection of targeted peaks is essential. In
addition, it is well known that PCR results can vary depending on
experimental conditions such as the type of thermal cycler and
Taq polymerase employed during PCR analysis. In a previous study,
Kunihisa 44 investigated the stability of polymorphic analysis of
CAPS markers by comparing the results obtained in 14 laboratories
to verify the adequacy of the markers used for variety
identification tests in F. x ananassa. Govan et al. 50 also
screened 32 SSRs that produced reliable PCR results for the
identification of 60 varieties. In the present study, the 45 SSR
markers were carefully screened by confirming the results of 3
independent PCRs. Thirty-six of the 45 markers were mapped onto a
total of 20 LGs of the integrated map. This suggests that the
polymorphisms of the selected markers roughly reflect genetic
diversity across the entire genome. Therefore, the markers were
found to be reliable for variety identification.

In this study, we developed large numbers of SSR 
markers and constructed the densest linkage map for 
F. x ananassa. By performing comparative mapping 
and variety distinction, the resources that we devel- 
oped were shown to be useful for genetic and 
genomic analysis as well as practical applications. Our 
results should contribute to the acceleration of 
advances in the study of F. x ananassa, along with the 
genus Fragaria. 